Naïve Nonparametric Bootstrap Model Weights
Abstract
The plausibility of competing statistical models may be assessed using penalized log-likelihood criteria such as the AIC, which is given by $\mathrm{AIC} = -2\ln L + 2k$ ($L$ being the maximized likelihood and $k$ the number of free parameters). The raw AIC values can be transformed to AIC model weights by $w_i = \exp(-\tfrac{1}{2}\Delta\mathrm{AIC}_i) / \sum_{r=1}^{R} \exp(-\tfrac{1}{2}\Delta\mathrm{AIC}_r)$, where $\Delta\mathrm{AIC}_i = \mathrm{AIC}_i - \min(\mathrm{AIC})$ and $R$ is the total number of candidate models (e.g., Burnham and Anderson, 2001). Recent work in statistical biology has suggested that model weights can also be obtained from the nonparametric bootstrap (e.g., Buckland, Burnham, and Augustin, 1997). The nonparametric bootstrap method samples, with replacement, $n$ values from the observed data $x = (x_1, x_2, \ldots, x_n)$ to obtain $M$ bootstrap replications $\{x^{*(1)}, \ldots, x^{*(M)}\}$. After the competing models are fit to each of the $M$ replications, the average weight method (e.g., Burnham and Anderson, 2002, p. 172) calculates AIC weights for each bootstrap sample and then takes the average of these weights. The selection frequency method (e.g., Buckland et al., 1997) constructs a weight for model $i$ as the proportion of the $M$ samples in which model $i$ has the lowest (i.e., preferred) raw AIC value. Despite the recent popularity of nonparametric bootstrapping of goodness-of-fit criteria, both naïve bootstrapping schemes are biased and can yield misleading results. This is best illustrated by a simple example for which the correct sampling distribution is known. Consider a logistic model for the mortality rate (binomial parameter $\mu$) of a simulated beetle, Tribolium digitalis. The model has CS$_2$ dosage (dose = {0, 1, 2, 4, 8, 16}) and gender (male = 1, female = 0) as predictors: $\mu = \{1 + \exp[-(\alpha + \beta \cdot \mathrm{dose} + \gamma \cdot \mathrm{gender})]\}^{-1}$. At each level of dose, mortality rates of 10 male and 10 female beetles were recorded.
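The weight formula and the two bootstrap schemes described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the `boot_aic` values are hypothetical, and the sketch assumes the competing models have already been fit to every bootstrap replicate, so that each replicate contributes one row of raw AIC values.

```python
import math

def aic_weights(aic_values):
    """Convert raw AIC values to AIC weights: w_i proportional to exp(-0.5 * dAIC_i)."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

def average_weight(aic_per_replicate):
    """Average weight method: mean of the AIC weights over all replicates."""
    weights = [aic_weights(row) for row in aic_per_replicate]
    n_models = len(aic_per_replicate[0])
    return [sum(w[i] for w in weights) / len(weights) for i in range(n_models)]

def selection_frequency(aic_per_replicate):
    """Selection frequency method: share of replicates won by each model.

    Ties go to the first model in the row (an arbitrary choice for this sketch).
    """
    n_models = len(aic_per_replicate[0])
    counts = [0] * n_models
    for row in aic_per_replicate:
        counts[row.index(min(row))] += 1
    return [c / len(aic_per_replicate) for c in counts]

# Hypothetical raw AIC values: four bootstrap replicates (rows), two models (columns).
boot_aic = [[100.0, 102.0], [98.0, 97.0], [101.0, 103.0], [99.5, 99.0]]
print(average_weight(boot_aic))
print(selection_frequency(boot_aic))
```

Both functions return one weight per candidate model, and the weights sum to one in either scheme. The two schemes can still disagree, because the selection frequency discards how large each AIC difference is.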
In our simulations, we defined the generating model for the population to be $\alpha = -2$, $\beta = 2$, $\gamma = 0$ (i.e., no gender effect). We sampled $K = 500$ independent data sets from this population model, and fitted to each data set both the true $\gamma = 0$ model and the less parsimonious model in which $\gamma$ is a free parameter. As shown in the top left panel of Figure 1, the distribution of $-2(\ln L_{\gamma=0} - \ln L_{\gamma\,\mathrm{free}})$ closely approximates the $\chi^2_{df=1}$ distribution expected according to theory. The top right panel shows the average sampling distribution obtained when the nonparametric bootstrap is applied to the same $K = 500$ independent samples, each sample in turn creating its own bootstrap distribution with $M = 500$ replications. The difference between the two distributions is striking. Analytically, the expected value of the nonparametric bootstrap distribution is asymptotically equal to 2, whereas the expected value of the $\chi^2_{df=1}$ distribution is 1 (Bollen and Stine, 1992). The naïve bootstrap fails because, for any particular sample, the null hypothesis (i.e., $\gamma = 0$) does not hold exactly (cf. Bollen and Stine, 1992). This disparity between the theoretical sampling distribution and the bootstrap sampling distribution has profound implications for the computation of model weights. The bottom left panel of Figure 1 shows the distribution of model weights for the true $\gamma = 0$ model, based on the same $K = 500$ samples that yielded the approximate $\chi^2_{df=1}$ distribution in the top left panel. Note that the maximum weight for the $\gamma = 0$ model is $e/(e + 1) \approx 0.731$, since its AIC value can be at most 2 lower than that of the model with $\gamma$ free. The bottom right panel shows the distribution of weights resulting from the nonparametric average weight method, which consistently yields lower AIC weights for the true $\gamma = 0$ model than expected based on theory.
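The upper bound of about 0.731 follows directly from the weight formula once the AIC difference is written in terms of the likelihood-ratio statistic $D = -2(\ln L_{\gamma=0} - \ln L_{\gamma\,\mathrm{free}}) \ge 0$. The short sketch below works through that arithmetic. The function name `simple_model_weight` is hypothetical, and the sketch assumes only the two nested models discussed here.

```python
import math

def simple_model_weight(lr_stat):
    """AIC weight of the gamma = 0 model, given D = -2(lnL_0 - lnL_free) >= 0."""
    # The free-gamma model pays for one extra parameter (+2 in AIC) but gains D
    # in fit, so AIC_free - AIC_0 = 2 - D.
    delta = 2.0 - lr_stat
    return 1.0 / (1.0 + math.exp(-0.5 * delta))

print(simple_model_weight(0.0))  # upper bound e / (e + 1), about 0.731
print(simple_model_weight(2.0))  # the two AIC values tie, so the weight is 0.5
```

Because the weight decreases monotonically in $D$, any procedure that inflates $D$ (as the naïve bootstrap does on average) pushes the weight of the simple model downward.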
As regards the nonparametric selection frequency method, the mean selection frequency for the $\gamma = 0$ model is about 0.681, which is substantially lower than the selection frequency expected according to theory based on the $\chi^2_{df=1}$ distribution (i.e., $\int_0^2 \chi^2_{df=1}(x)\,dx \approx 0.843$, since the AIC values are equal when $-2(\ln L_{\gamma=0} - \ln L_{\gamma\,\mathrm{free}}) = 2$). The demonstration that the naïve nonparametric bootstrap yields model weights that are biased against the simple model can have at least two negative consequences. First, if model weights are used to quantify evidence, the plausibility of the complex model will be overestimated. Second, model-averaged inference weights the contribution of each model by its model weight. Nonparametric bootstrap weights will spuriously increase the impact of the complex model, and this will hurt inference because parameter estimates for complex models are more variable.
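The theoretical selection frequency of about 0.843 quoted above can be checked with the standard library alone, using the fact that a chi-square variable with one degree of freedom is a squared standard normal, so its CDF at $x$ equals $\mathrm{erf}(\sqrt{x/2})$. The helper name `chi2_df1_cdf` is an assumption made for this sketch.

```python
import math

def chi2_df1_cdf(x):
    """CDF of the chi-square distribution with 1 df: P(Z^2 <= x) = erf(sqrt(x / 2))."""
    return math.erf(math.sqrt(x / 2.0)) if x > 0 else 0.0

# The gamma = 0 model has the lower AIC whenever the likelihood-ratio
# statistic is below 2, so its expected selection frequency is P(chi2_1 <= 2).
expected_selection_frequency = chi2_df1_cdf(2.0)
print(round(expected_selection_frequency, 3))  # 0.843
```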